Segmenting and Clustering Neighborhoods in São Paulo

Introduction/Business Problem

Brazil is a developing country, and São Paulo is its largest city and one of the most important economic centers in Latin America. As economic growth drives the city's development, it changes how the city is occupied and creates new opportunities for investment and for better living.

The goal of this project is to provide an understanding of the city's different regions by segmenting them in a way that supports decisions on where to invest (by opening a new restaurant, for example) or where to live, based on the kinds of places available nearby.

Data

In this project we divide the city into its districts, using the geographic data available on the GeoSampa website, an official government tool that publishes public data.

http://geosampa.prefeitura.sp.gov.br/PaginasPublicas/_SBC.aspx#

Once the data is downloaded from the website, it needs to be converted into a GeoJSON file with the correct coordinate system (WGS84 latitude/longitude, which Folium expects). For that step, we use the MyGeodata online tool, available at https://mygeodata.cloud/.
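A quick sanity check on the converted file can catch a wrong coordinate system before the map is drawn. The sketch below is not part of the original workflow: it assumes the file is a GeoJSON FeatureCollection whose geometries carry a `coordinates` array, and the helper names `iter_positions` and `looks_like_wgs84` are hypothetical. It only verifies that every position is a plausible longitude/latitude pair in degrees, which projected coordinates in meters would fail.

```python
import json


def iter_positions(coords):
    """Yield every position ([lon, lat] or [lon, lat, alt]) from a nested
    GeoJSON coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_wgs84(geojson):
    """Return True if every position is a plausible (lon, lat) pair in degrees."""
    for feature in geojson["features"]:
        for lon, lat, *_ in iter_positions(feature["geometry"]["coordinates"]):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                return False
    return True


# Hypothetical one-district sample. The real file could be loaded with:
#   with open("LL_WGS84_KMZ_distrito.geojson", encoding="utf-8") as f:
#       districts = json.load(f)
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "hypothetical district"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-46.7, -23.6], [-46.6, -23.6], [-46.6, -23.5],
                             [-46.7, -23.5], [-46.7, -23.6]]],
        },
    }],
}

print(looks_like_wgs84(sample))
```

A file that passes this check is not guaranteed to be correct, but a file that fails it was almost certainly exported in a projected coordinate system and needs to be converted again.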

Below, we create a choropleth map with the Folium library to show how the districts are divided.

In [5]:
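import folium

# Base map centered on São Paulo
sp_districts = folium.Map(location=[-23.5489, -46.6388], zoom_start=10)

# Draw the district boundaries from the converted GeoJSON file
folium.Choropleth(
    geo_data="LL_WGS84_KMZ_distrito.geojson"
).add_to(sp_districts)

folium.LayerControl().add_to(sp_districts)

# Display the map as the cell output
sp_districts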

To retrieve and group venue data for the city, we use the Foursquare API, available for free at: https://developer.foursquare.com/places
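As an illustration of what one such call could look like, the sketch below builds (but does not send) a venue search request for a single coordinate pair. The endpoint URL, the `ll`, `radius` and `limit` parameters, and the `Authorization` header are assumptions based on the v3 Places search API and should be checked against the current Foursquare documentation. The function name `build_venue_request`, the default radius and limit, and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint (v3 Places search); verify against the current API docs.
SEARCH_URL = "https://api.foursquare.com/v3/places/search"


def build_venue_request(lat, lon, api_key, radius_m=1000, limit=50):
    """Build a GET request for venues around (lat, lon) without sending it."""
    params = urlencode({
        "ll": f"{lat},{lon}",   # latitude,longitude of the search center
        "radius": radius_m,     # search radius in meters (assumed default)
        "limit": limit,         # maximum number of venues returned
    })
    headers = {"Authorization": api_key, "Accept": "application/json"}
    return Request(f"{SEARCH_URL}?{params}", headers=headers)


# Placeholder key and the map center used earlier in this notebook.
req = build_venue_request(-23.5489, -46.6388, api_key="YOUR_API_KEY")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual call. Keeping request construction separate makes it easy to inspect the URL before spending API quota.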

Using the latitude and longitude of each district, we call the API to retrieve venue data, which we then use to segment the districts.
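The text does not say how a single latitude/longitude pair is obtained for each district. One possible approach, shown here as a sketch under stated assumptions, is to take the centroid of each district polygon from the GeoJSON file. The helpers `ring_centroid` and `district_center` are hypothetical. They treat degrees as planar coordinates, which is a reasonable approximation at district scale, and use only the outer ring of the largest polygon.

```python
def ring_centroid(ring):
    """Return (lon, lat, area) for a closed ring of positions, using the
    shoelace formula. Assumes the ring has non-zero area."""
    area2 = cx = cy = 0.0
    for p, q in zip(ring, ring[1:]):
        cross = p[0] * q[1] - q[0] * p[1]
        area2 += cross
        cx += (p[0] + q[0]) * cross
        cy += (p[1] + q[1]) * cross
    return cx / (3 * area2), cy / (3 * area2), abs(area2) / 2


def district_center(geometry):
    """Return (lat, lon) of the centroid of the largest polygon's outer ring."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    # poly[0] is the outer ring; inner rings (holes) are ignored in this sketch.
    lon, lat, _ = max((ring_centroid(poly[0]) for poly in polygons),
                      key=lambda c: c[2])
    return lat, lon


# Hypothetical square district, not real GeoSampa data.
square = {
    "type": "Polygon",
    "coordinates": [[[-46.7, -23.6], [-46.6, -23.6], [-46.6, -23.5],
                     [-46.7, -23.5], [-46.7, -23.6]]],
}
center_lat, center_lon = district_center(square)
print(round(center_lat, 4), round(center_lon, 4))
```

For a strongly concave district the centroid can fall outside the boundary, so it is worth plotting the resulting points on the map before using them as search centers.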